6 research outputs found

    VComputeBench: A Vulkan Benchmark Suite for GPGPU on Mobile and Embedded GPUs

    GPUs have become immensely important computational units on embedded and mobile devices. However, GPGPU developers are often unable to exploit the compute power offered by GPUs on these devices, mainly due to the lack of support for traditional programming models such as CUDA and OpenCL. The recent introduction of the Vulkan API provides a new programming model that could be explored for GPGPU computing on these devices, as it supports compute and promises to be portable across different architectures. In this paper we propose VComputeBench, a set of benchmarks that help developers understand the differences in performance and portability of Vulkan. We also evaluate the suitability of Vulkan as an emerging cross-platform GPGPU framework by conducting a thorough analysis of its performance compared to CUDA and OpenCL on mobile as well as desktop platforms. Our experiments show that Vulkan provides better platform support on mobile devices and can be regarded as a good cross-platform GPGPU framework. It offers comparable performance, and with some low-level optimizations it achieves average speedups of 1.53x over CUDA and 1.66x over OpenCL on desktop platforms, and a 1.59x average speedup over OpenCL on mobile platforms. However, while Vulkan’s low-level control can enhance performance, it requires a significantly higher programming effort.
    EC/H2020/688759/EU/Low-Power Parallel Computing on GPUs 2/LPGPU

    Approximating Memory-bound Applications on Mobile GPUs

    Accepted for the 2019 International Conference on High Performance Computing & Simulation (HPCS).
    Approximate computing techniques are often used to improve the performance of applications that can tolerate some amount of inaccuracy in their computations or data. In the context of embedded and mobile systems, a broad range of applications have exploited approximation techniques to improve performance and overcome the limited capabilities of the hardware. On such systems, even small performance improvements can be sufficient to meet scheduling requirements such as hard real-time deadlines. We study the approximation of memory-bound applications on mobile GPUs using kernel perforation, an approximation technique that exploits the availability of fast GPU local memory to provide high performance with more accurate results. Using this technique, we approximated six applications and evaluated them on two mobile GPU architectures with very different memory layouts: a Qualcomm Adreno 506 and an ARM Mali T860 MP2. Results show that, even when local memory is not mapped to dedicated fast memory in hardware, kernel perforation still achieves a 1.25x speedup because of improved memory layout and caching effects. Mobile GPUs with dedicated local memory show a speedup of up to 1.38x.
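The idea behind kernel perforation can be illustrated with a minimal numpy sketch: skip part of the input loads of a memory-bound stencil and reconstruct the skipped values from their neighbours before computing. This is an illustrative toy, not the paper's GPU implementation; the function names, the 3-point stencil, and the neighbour-copy reconstruction scheme are all assumptions for the example.

```python
import numpy as np

def stencil_exact(x):
    # 3-point moving average: a simple memory-bound stencil
    return (np.roll(x, 1) + x + np.roll(x, -1)) / 3.0

def stencil_perforated(x):
    # Input perforation (sketch): pretend every odd-indexed load is
    # skipped, reconstruct it from its even-indexed neighbour, then
    # run the same stencil on the approximated input.
    approx = x.copy()
    approx[1::2] = approx[0::2]
    return (np.roll(approx, 1) + approx + np.roll(approx, -1)) / 3.0

x = np.linspace(0.0, 1.0, 1024)
error = np.abs(stencil_exact(x) - stencil_perforated(x)).mean()
```

On a GPU, the point is that the halved number of global-memory loads can hide the dominant memory latency, while the reconstruction happens in fast local memory; here `error` merely quantifies the accuracy cost of the approximation on smooth data.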

    The LPGPU2 Project: Low-Power Parallel Computing on GPUs : Extended Abstract

    The LPGPU2 project is a 30-month project (Innovation Action) funded by the European Union. Its overall goal is to develop an analysis and visualization framework that enables GPU application developers to improve the performance and power consumption of their applications. To achieve this overall goal, several key objectives need to be met. First, several applications (use cases) need to be developed for or ported to low-power GPUs. Thereafter, these applications need to be optimized using the tooling framework. In addition, power measurement devices and power models need to be developed that are 10x more accurate than the state of the art. The project consortium actively promotes open vendor-neutral standards via the Khronos Group. This paper briefly reports on the achievements made in the first half of the project, focusing on the progress made in applications; in power measurement, estimation, and modelling; and in the analysis and visualization tool suite.
    EC/H2020/688759/EU/Low-Power Parallel Computing on GPUs 2/LPGPU

    Enabling GPU software developers to optimize their applications – The LPGPU2 approach

    Low-power GPUs have become ubiquitous: they can be found in domains ranging from wearable and mobile computing to automotive systems. With this ubiquity has come a wider range of applications exploiting low-power GPUs, placing ever-increasing demands on the performance and power efficiency of these devices. The LPGPU2 project is an EU-funded, 30-month Innovation Action that aims to develop an analysis and visualization framework enabling GPU application developers to improve the performance and power consumption of their applications. To this end, the project follows a holistic approach. First, several applications (use cases) are being developed for or ported to low-power GPUs. These applications will be optimized using the tooling framework in the last phase of the project. In addition, power measurement devices and power models are being devised that are 10x more accurate than the state of the art. The ultimate goal of the project is to promote open vendor-neutral standards via the Khronos Group. This paper briefly reports on the achievements made in the first phase of the project (up to month 18) and focuses on the progress made in applications; in power measurement, estimation, and modelling; and in the analysis and visualization tool suite.
    EC/H2020/688759/EU/Low-Power Parallel Computing on GPUs 2/LPGPU

    Performance-Counter-Based Power Modeling of Mobile GPUs with Deep Learning

    GPUs have recently become important computational units on mobile devices, resulting in heterogeneous devices that can run a variety of parallel processing applications. While developing and optimizing such applications, estimating power consumption is of immense importance as energy efficiency has become the key design constraint to optimize for on these platforms. In this work, we apply deep learning techniques in building a predictive model for estimating the power consumption of parallel applications on a heterogeneous mobile SoC. Our model is an artificial neural network (NN) trained using CPU and GPU hardware performance counters along with measured power data. The model is trained and evaluated with data collected using a set of OpenGL graphics workloads as well as OpenCL compute benchmarks. Our evaluations show that our model can achieve accurate power estimates with a mean relative error of 4.47% with respect to real power measurements. When compared to other models, our NN model is about 3.3x better than a statistical linear regression model and 2x better than a state-of-the-art NN model.
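The modeling approach in this abstract — a small neural network regressing measured power from hardware performance counters — can be sketched in a few lines of numpy. This is a toy with synthetic stand-in data: the four "counters", the made-up power function, the network size, and the training schedule are all assumptions for illustration, not the paper's setup (which trains on real CPU/GPU counters and measured power).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-sample counters (e.g. GPU busy cycles, memory reads,
# CPU instructions, cache misses), normalized to [0, 1], and a made-up
# "measured power" that depends on them nonlinearly.
X = rng.uniform(0.0, 1.0, size=(512, 4))
power = 1.5 + 2.0 * X[:, 0] + 0.8 * X[:, 1] * X[:, 2] + 0.3 * np.sin(3 * X[:, 3])

# One-hidden-layer NN trained with full-batch gradient descent on MSE.
W1 = rng.normal(0, 0.5, (4, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, (16, 1)); b2 = np.zeros(1)
lr = 0.05
for _ in range(3000):
    h = np.tanh(X @ W1 + b1)            # hidden activations
    pred = (h @ W2 + b2).ravel()        # predicted power
    err = pred - power
    # backpropagation of the squared-error gradient
    gW2 = h.T @ err[:, None] / len(X)
    gb2 = err.mean(keepdims=True)
    dh = (err[:, None] @ W2.T) * (1.0 - h ** 2)
    gW1 = X.T @ dh / len(X)
    gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# Mean relative error, the metric the abstract reports (4.47% there).
mre = np.mean(np.abs(pred - power) / power)
```

In practice one would hold out a validation split and standardize the raw counter values; the sketch only shows the counters-in, power-out regression structure.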